NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Minimally Modifying a Markov Game to Achieve Any Nash Equilibrium and Value

Wu, Young; Mcmahan, Jeremy; Chen, Yiding; Chen, Yudong; Zhu, Jerry; Xie, Qiaomin (July 2024, Proceedings of Machine Learning Research)

We study the game modification problem, where a benevolent game designer or a malevolent adversary modifies the reward function of a zero-sum Markov game so that a target deterministic or stochastic policy profile becomes the unique Markov perfect Nash equilibrium and has a value within a target range, in a way that minimizes the modification cost. We characterize the set of policy profiles that can be installed as the unique equilibrium of a game and establish sufficient and necessary conditions for successful installation. We propose an efficient algorithm that solves a convex optimization problem with linear constraints and then performs random perturbation to obtain a modification plan with a near-optimal cost.
more » « less
Full Text Available
Minimally Modifying a Markov Game to Achieve Any Nash Equilibrium and Value

Wu, Young; McMahan, Jeremy; Chen, Yiding; Chen, Yudong; Zhu, Xiaojin; Xie, Qiaomin (June 2024, Proceedings of the 41 st International Conference on Machine Learning)

Full Text Available
Exact Policy Recovery in Offline RL with Both Heavy-Tailed Rewards and Data Corruption

https://doi.org/10.1609/aaai.v38i10.29022

Chen, Yiding; Zhang, Xuezhou; Xie, Qiaomin; Zhu, Xiaojin (March 2024, Proceedings of the AAAI Conference on Artificial Intelligence)

We study offline reinforcement learning (RL) with heavy-tailed reward distribution and data corruption: (i) Moving beyond subGaussian reward distribution, we allow the rewards to have infinite variances; (ii) We allow corruptions where an attacker can arbitrarily modify a small fraction of the rewards and transitions in the dataset. We first derive a sufficient optimality condition for generalized Pessimistic Value Iteration (PEVI), which allows various estimators with proper confidence bounds and can be applied to multiple learning settings. In order to handle the data corruption and heavy-tailed reward setting, we prove that the trimmed-mean estimation achieves the minimax optimal error rate for robust mean estimation under heavy-tailed distributions. In the PEVI algorithm, we plug in the trimmed mean estimation and the confidence bound to solve the robust offline RL problem. Standard analysis reveals that data corruption induces a bias term in the suboptimality gap, which gives the false impression that any data corruption prevents optimal policy learning. By using the optimality condition for the generalized PEVI, we show that as long as the bias term is less than the ``action gap'', the policy returned by PEVI achieves the optimal value given sufficient data.
more » « less
Full Text Available
Corruption-robust offline reinforcement learning

Zhang, Xuezhou; Chen, Yiding; Zhu, Xiaojin; Sun, Wen (January 2022, The 25th International Conference on Artificial Intelligence and Statistics)

Full Text Available
Robust policy gradient against strong data corruption

Zhang, Xuezhou; Chen, Yiding; Zhu, Xiaojin; Sun, Wen (January 2021, International Conference on Machine Learning (ICML))
null (Ed.)
Full Text Available
Optimal attack against autoregressive models by manipulating the environment

https://doi.org/10.1609/aaai.v34i04.5760

Chen, Yiding; Zhu, Xiaojin (January 2020, AAAI Conference on Artificial Intelligence)

Full Text Available
Ionospheric Topside Diffusive Flux and the Formation of Summer Nighttime Ionospheric Electron Density Enhancement Over Millstone Hill

https://doi.org/10.1029/2021GL097651

Cai, Yihui; Yue, Xinan; Wang, Wenbin; Zhang, Shun‐Rong; Liu, Huixin; Lei, Jiuhou; Ren, Zhipeng; Chen, Yiding; Ding, Feng; Ren, Dexin (February 2022, Geophysical Research Letters)

Abstract Ionospheric F‐region electron density is anomalously higher in the evening than during the daytime on many occasions in the summer in geomagnetic mid‐latitude regions. This unexpected ionospheric diurnal variation has been studied for several decades. The underlying processes have been suggested to be related to meridional winds, topside influx arising from sunset ionospheric collapse, and other factors. However, substantial controversies remain unresolved. Using a numerical model driven by the statistical topsideO⁺diffusive flux from the Millstone Hill incoherent scatter radar data, we provide new insight into the competing roles of topside diffusive flux, neutral winds, and electric fields in forming the evening density peak. Simulations indicate that while meridional winds, which turn equatorward before sunset, are essential to sustain the daytime ionization near dusk, the topside diffusive flux is critically important for the formation and timing of the summer evening density peak.
more » « less

Search for: All records